Improved Sequential Pattern Mining Using an Extended Bitmap Representation
Abstract
The main challenge in mining sequential patterns is the high processing cost of support counting for a large number of candidate patterns. To address this problem, the SPAM algorithm was proposed at SIGKDD 2002; it combines a depth-first traversal of the search space with a vertical bitmap representation to provide efficient support counting. According to its experimental results, SPAM outperformed the earlier SPADE and PrefixSpan algorithms on large datasets. However, SPAM is efficient only under the assumption that a huge amount of main memory is available, which calls its practicality into question. In this paper, an improved version of the SPAM algorithm, called I-SPAM, is proposed. By extending the data representation structures, several heuristic mechanisms are proposed to further speed up support counting. Moreover, the memory required to store temporary data during the mining process is less than that needed by SPAM. The experimental results show that I-SPAM achieves execution times of the same order of magnitude as SPAM, and sometimes better, while requiring about half of SPAM's maximum memory.
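To make the idea of bitmap-based support counting concrete, here is a minimal Python sketch of a SPAM-style vertical bitmap with sequence-extension (S-step) and itemset-extension (I-step). It illustrates only the general technique named in the abstract, not the extended structures or heuristics of I-SPAM, which the abstract does not describe. All names (`build_bitmaps`, `s_extend`, `i_extend`, the toy database `db`) are hypothetical, and each sequence's bitmap is stored as a Python integer purely for simplicity.

```python
from typing import Dict, List, Set

def build_bitmaps(db: List[List[Set[str]]]) -> Dict[str, List[int]]:
    """Map each item to one integer per sequence; bit i is set when the
    item occurs in the i-th itemset of that sequence (vertical layout)."""
    bitmaps: Dict[str, List[int]] = {}
    for sid, seq in enumerate(db):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                bitmaps.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    return bitmaps

def support(bitmap: List[int]) -> int:
    """A sequence supports a pattern if any bit is set in its slice."""
    return sum(1 for bits in bitmap if bits)

def s_step_transform(bitmap: List[int], lengths: List[int]) -> List[int]:
    """For each sequence, clear bits up to and including the first set bit
    and set every later position, marking where an extension may occur."""
    out = []
    for bits, n in zip(bitmap, lengths):
        if bits == 0:
            out.append(0)
            continue
        first = (bits & -bits).bit_length() - 1   # index of lowest set bit
        full = (1 << n) - 1                       # n valid positions
        out.append(full & ~((1 << (first + 1)) - 1))
    return out

def s_extend(prefix: List[int], item: List[int], lengths: List[int]) -> List[int]:
    """Sequence-extension: the item must appear in a strictly later itemset."""
    return [p & i for p, i in zip(s_step_transform(prefix, lengths), item)]

def i_extend(prefix: List[int], item: List[int]) -> List[int]:
    """Itemset-extension: the item must appear in the same itemset."""
    return [p & i for p, i in zip(prefix, item)]

# Toy database (an assumption for illustration): three customer sequences.
db = [
    [{"a"}, {"b"}, {"c"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}],
]
lengths = [len(seq) for seq in db]
bitmaps = build_bitmaps(db)

a_then_b = s_extend(bitmaps["a"], bitmaps["b"], lengths)   # pattern <(a)(b)>
a_with_b = i_extend(bitmaps["a"], bitmaps["b"])            # pattern <(a b)>
a_then_c = s_extend(bitmaps["a"], bitmaps["c"], lengths)   # pattern <(a)(c)>
```

In this sketch, counting support reduces to bitwise ANDs followed by a non-zero check per sequence, which is why the vertical bitmap layout makes candidate evaluation cheap; the trade-off is that the bitmaps for all items must be held in memory, which is the cost the abstract says I-SPAM aims to reduce.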
Similar resources
TKS: Efficient Mining of Top-K Sequential Patterns
Sequential pattern mining is a well-studied data mining task with wide applications. However, fine-tuning the minsup parameter of sequential pattern mining algorithms to generate enough patterns is difficult and time-consuming. To address this issue, the task of top-k sequential pattern mining has been defined, where k is the number of sequential patterns to be found, and is set by the user. In ...
Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams
Sequential pattern mining searches data sequences for frequent, time-ordered sequential patterns and has wide application. Data streams are streams of data that arrive at high speed. Because memory capacity is limited and mining must happen in real time, the mining results need to be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...
A Framework for Mining Closed Sequential Patterns
Sequential pattern mining algorithms developed so far provide better performance for short sequences but are inefficient at mining long sequences, since long sequences generate a large number of frequent subsequences. To efficiently mine long sequences, closed sequential pattern mining algorithms have been developed. These algorithms mine closed sequential patterns which don’t have any super se...
An Advanced Model for Mining Time Interval Sequential Patterns in Stream data
Most existing sequential pattern mining algorithms struggle to find long, significant time-interval sequential patterns in information streams. In this paper, we propose a new bitmap-based algorithm, called DSBMMS, for mining d time-interval sequential patterns in information streams; it is based on binary bit counting and multiple time-interval sequential position. We transform the whole seq...
Efficient Sequential Pattern Mining Algorithms
Sequential pattern mining is a heavily researched area in the field of data mining with a wide variety of applications. The task of discovering frequent sequences is challenging, because the algorithm needs to process a combinatorially explosive number of possible sequences. Most of the methods dealing with the sequential pattern mining problem are based on the approach of the traditional task of...